Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set IO to be sync for File by default #15236

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jgaskins
Copy link
Contributor

Since userspace buffers prevent Crystal from closing files automatically on program exit, maybe we should avoid using them for file I/O by default. Folks can enable userspace buffering explicitly if they need its benefits.

Fixes #13995

@jgaskins jgaskins force-pushed the blocking-file-io-by-default branch from 737afc1 to e764834 Compare November 28, 2024 18:37
@jgaskins jgaskins changed the title Set IO to be blocking for File by default Set IO to be sync for File by default Nov 28, 2024
@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 28, 2024

Somewhat meta: The wording around this is so bad. Why on earth is the base verb used sync and not buffer or even write_buffer?

That said I agree with the basic premise of matching the promises of the underlying OS by default, or at least to not be less safe than the OS. That way people will hopefully get less bad surprises even if things are too complicated in general: https://transactional.blog/how-to-learn/disk-io

Also with things like
https://lore.kernel.org/linux-fsdevel/[email protected]/t/#u on the horizon getting rid of complexity by default might make even more sense. But having easy access to buffering as an opt in option do still make a lot of sense (that option might also make sense for the buffered data, to avoid double buffering).

@jgaskins
Copy link
Contributor Author

Why on earth is the base verb used sync and not buffer or even write_buffer?

I've been wondering this for a while, too. I went to open an issue about it but someone already had. #9023 😄

@jgaskins
Copy link
Contributor Author

Also, the spec failures appear to be due to what @straight-shoota mentioned here. I think those specs should change, even if just to explicitly set sync = false, because they depend on File defaulting to it.

@RX14
Copy link
Contributor

RX14 commented Nov 29, 2024

Why should files behave differently to i.e. sockets here? If you call exit at a random point (i believe this is the only case that makes a difference) why would you expect all your file IO to complete but not your socket IO?

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 29, 2024

If you call exit

There is no difference from program terminating normally, no?

@jgaskins
Copy link
Contributor Author

Why should files behave differently to i.e. sockets here?

I don't understand the question. They already behave differently, and this PR aligns them better. To illustrate what I mean, this program does not write the string to the file:

file = File.open("output.txt", "w")
file.puts "this is some output"
sleep 100.milliseconds

This program sends data over a socket, reads from the other end of the socket, and writes to STDOUT what the TCP client sent:

require "socket"

spawn do
  server = TCPServer.new(1234)
  while incoming = server.accept?
    spawn puts incoming.gets
  end
end

socket = TCPSocket.new("localhost", 1234)
socket.puts "hi"

# Let the other fibers work
sleep 100.milliseconds

The first one doesn't need the sleep call since file I/O is blocking (side note: blocking vs sync is only a useful distinction to the tiniest subset of programmers — the terms are frequently used interchangeably), but I put it in there to show I had nothing up my sleeves. Neither program flushes or closes any IO objects before exiting.

The second program outputs to STDOUT. The first does not write to the file. And that doesn't make any sense.

@HertzDevil
Copy link
Contributor

If this is a concern, why would non-blocking, regular files still have buffered writes?

@jgaskins
Copy link
Contributor Author

If this is a concern, why would non-blocking, regular files still have buffered writes?

Files are blocking by default, aren't they?

blocking =
case system_info.type
when .pipe?, .socket?, .character_device?
false
else
true
end

@HertzDevil
Copy link
Contributor

Well let's say I indeed passed blocking: false explicitly when opening a File. This PR suggests that it is okay for those files not to be flushed on program exit.

@jgaskins
Copy link
Contributor Author

This PR suggests that it is okay for those files not to be flushed on program exit.

I'm very verbose in my writing. If I didn't write something, chances are I wasn't suggesting it, either.

What I did say in the very first sentence of this PR was that this is about the default behavior. The default settings are how people will use these most of the time, especially people who are less familiar with the language.

At the moment, I don't have many opinions for when people override the default settings, tbh.

@HertzDevil
Copy link
Contributor

Disabling write buffering when File.new is called with 1 out of 5 optional arguments being blocking: true is a default behavior. Disabling write buffering in File.new regardless of blocking's value is also a defualt behavior.

Why do you think the former is more appropriate than the latter?

@jgaskins
Copy link
Contributor Author

Disabling write buffering in File.new regardless of blocking's value is also a defualt behavior.

If this were true, wouldn't the File example in this comment write content to the file? Because that's not what's happening.

All I'm looking for is for Crystal to do the safe thing by default. Nothing more than that. I don't understand why there is so much pushback.

@HertzDevil
Copy link
Contributor

That example has nothing to do with my question. Maybe let me put it this way:

class File
  private def initialize(@path, fd : Int, blocking = false, encoding = nil, invalid = nil)
    self.set_encoding(encoding, invalid: invalid) if encoding
    super(fd, blocking)
  end
end

If I add self.sync = false to this constructor, that also disables write buffering on all Files by default. So why do you think passing blocking: false to File (not IO::FileDescriptor) does not count as part of this default?

@jgaskins
Copy link
Contributor Author

I can't tell for sure, but I feel like your questions are leading toward pointing out that you would prefer to split out blocking and sync as separate options. Is that where you're going with this?

@RX14
Copy link
Contributor

RX14 commented Nov 30, 2024

They already behave differently, and this PR aligns them better

My apologies, I thought Socket was sync = false by default. Perhaps a confusion caused by HTTP::Server::Response being sync = true by default.

@jgaskins
Copy link
Contributor Author

jgaskins commented Dec 1, 2024

My apologies, I thought Socket was sync = false by default.

No worries, when you asked I thought I might've been the one that was mistaken. I wrote up the example code because I didn't want to be an IRL Principal Skinner meme.

Am I out of touch? No, it is the maintainer who are wrong.

@crysbot
Copy link

crysbot commented Dec 2, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/using-wait-groups-with-http-server/7452/4

@straight-shoota
Copy link
Member

straight-shoota commented Dec 2, 2024

I'm sympathetic to disable output buffering for (ordinary) files by default. But I don't think this PR is the best way to do it.
First of all, this configuration should be considered a cross-platform default, so I'd prefer not to have it in platform-specific code.
Also with this change most file descriptors will be unbuffered by default, so it would seem a good idea to disable it globally in IO::FileDescriptor and only enable it explicitly for the few cases which should still use output buffering (if there are any).

Before jumping to a new implementation, I believe this topic may need a bit more discussion. Changing such a default behaviour must be explored deeply to understand its effects. Especially since this affects the conceptual premise of streams in Crystal (this is already the case for sockets of course, but files makes it even more an issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flush file output on program exit
7 participants